The segmentation and labelling of speech databases
Abstract
The term ‘transcription’ may be used to refer to the representation of a text or an utterance as a string of symbols, without any linkage to the acoustic representation of the utterance. This was the pattern followed by speech and text corpus work during the 1980s, such as the prosodically transcribed Spoken English Corpus (Knowles et al. 1995). These corpora did not link the symbolic representation with the physical acoustic waveform, and hence were not fully machine-readable. A recent project, MARSEC (Roach et al. 1993), has generated these links for the Spoken English Corpus such that it is now a segmented and labelled database. This is the form that is most useful to researchers in speech and language technology. The types of segments that may be delimited are of various kinds, depending on the purpose for which the database is collected. The German PHONDAT and Verbmobil-PHONDAT corpora use the CRIL (Computer Representation of Individual Languages) conventions formulated by a working group at the 1991 Kiel convention of the International Phonetic Association. These conventions propose three levels of representation: orthographic, phonetic and narrow phonetic. The orthographic level contains the orthographic representation of the spoken text. The phonetic level specifies the phonetic form of a word in citation form. The narrow phonetic level gives the phonetic labelling of the particular token of the word that was recorded. A more detailed system of levels of labelling has been proposed by Barry & Fourcin (1992), which includes the above three levels. Each given speech corpus will choose one or more of these levels, which are described in detail below, and which grew out of the SAM project for the major European languages. The format of label (transcription) files varies widely across research institutions. The WAVES format is becoming popular, and has the advantage of being human-readable.
It is advisable to use a label file format that can easily be converted to a WAVES label file, for the sake of portability across different systems. During the International Conference on Spoken Language Processing (ICSLP) in Banff in 1992, a workshop was held on Orthographic and Phonetic Transcription. The workshop goals were to agree on areas where community-wide conventions are needed, to identify
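As an illustration of the three CRIL levels described in the abstract, the following minimal sketch links each label to a time interval in the waveform, which is the step that turns a transcription into a segmented and labelled database. Everything in it is an assumption made for illustration: the class and function names, the example timings and symbols, and the tab-separated output. The output is a hypothetical plain-text layout, not the WAVES label file format.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Segment:
    """One labelled stretch of the waveform (times in seconds)."""
    start: float
    end: float
    label: str


@dataclass
class LabelledUtterance:
    """Hypothetical container for the three levels of representation."""
    orthographic: List[Segment]      # spelling of the spoken text
    phonetic: List[Segment]          # citation form of each word
    narrow_phonetic: List[Segment]   # what was actually pronounced


def check_tier(tier: List[Segment]) -> None:
    """Raise ValueError unless segments are ordered and do not overlap."""
    previous_end = 0.0
    for seg in tier:
        if seg.start < previous_end or seg.end <= seg.start:
            raise ValueError(f"bad segment: {seg}")
        previous_end = seg.end


def to_label_lines(tier: List[Segment]) -> List[str]:
    """Render a tier as human-readable 'start<TAB>end<TAB>label' lines."""
    return [f"{seg.start:.3f}\t{seg.end:.3f}\t{seg.label}" for seg in tier]


# Invented example: the word "and" produced in a reduced form.
utt = LabelledUtterance(
    orthographic=[Segment(0.10, 0.30, "and")],
    phonetic=[Segment(0.10, 0.30, "{nd")],
    narrow_phonetic=[Segment(0.10, 0.18, "@"), Segment(0.18, 0.30, "n")],
)

for tier in (utt.orthographic, utt.phonetic, utt.narrow_phonetic):
    check_tier(tier)

narrow_lines = to_label_lines(utt.narrow_phonetic)
```

Keeping each level as its own list of timed segments means a corpus can provide one level or several, and a plain-text rendering of this kind stays straightforward to convert to other label file formats.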
Similar resources
A Modified Character Segmentation Algorithm for Farsi Printed Text Using Upper Contour Labelling
In this paper, a modified segmentation algorithm for printed Farsi words is presented. This algorithm is based on a previous work by Azmi that uses the conditional labeling of the upper contour to find the segmentation points. The main objective is to improve the segmentation results for low quality prints. To achieve this, various modifications on local baseline detection, contour labeling an...
SLAM: segmentation and labelling automatic module
An interactive Segmentation and Labelling Automatic Module (SLAM), especially developed for Windows-based Personal Computers, is described. The system is extremely user-friendly and it was designed with the aim of supporting speech scientists in assessing the very heavy and time-consuming task of segmenting a big amount of speech material such as that caused by the tremendous spread of new and ...
Cluster-Based Image Segmentation Using Fuzzy Markov Random Field
Image segmentation is an important task in image processing and computer vision which attracts many researchers' attention. There are two sets of information about pixels in an image: statistical and structural information, which refer to the feature values of pixel data and the local correlation of pixel data, respectively. Markov random field (MRF) is a tool for modeling statistical and structural inf...
AAYUDHA: A Tool for Automatic Segmentation and Labelling of Continuous Tamil Speech
Speech, an effective way of communication between humans, is now becoming an alternative way to communicate between human and machine. This alternative is nowadays used in many real-time systems for faster, easier and more comfortable response and communication. Speech segmentation and labelling are processes that are key to the accuracy of much speech-related research. A tool ‘...
Stages and Method of Preparing Syllable and Diphone Speech Databases for a Persian Text-to-Speech System
Speech databases are part of concatenative text-to-speech synthesis systems. The phonetic quality of the databases plays a significant role in the naturalness of the synthesized speech. This paper introduces two speech databases for Persian, one syllable-based and one diphone-based, and examines how they were developed, their specifications, and their advantages relative to each other. ...
Publication date: 1995